Abstract

Self-stabilizing systems have the ability to converge to a correct behavior when started in 
any configuration. Most of the work done so far in the self-stabilization area assumed either 
communication via shared memory or via FIFO channels. 

This paper is the first to lay the foundations for the design of self-stabilizing message passing
algorithms over unreliable non-FIFO channels. We propose an optimal stabilizing data-link
layer that emulates a reliable FIFO communication channel over unreliable capacity-bounded
non-FIFO channels.

1 Introduction 

Self-stabilization [9, 10, 17] is one of the most versatile techniques to sustain availability, reliability,
and serviceability in modern distributed systems. After the occurrence of a catastrophic failure that 
placed the system components in some arbitrary global state, self-stabilization guarantees recovery 
to a correct behavior in finite time without external (i.e. human) intervention. 

As self-stabilization is usually considered a hard property to satisfy, most related works used a
simple communication model where processes can determine the current state of every neighbor
(and update their own state accordingly) in an atomic manner (this model is referred to in the
literature as the state model or systems with central/distributed daemon). Asynchronous message
passing is a more realistic way, compared to the state model, for the communication of processes
in distributed systems. In such settings processes communicate by exchanging messages, where
sending and receiving a message are two separate atomic actions. Transformers that let shared memory
protocols run in message passing systems, assuming the existence of FIFO channels, have been
suggested, see e.g. [11, 10]. At the core of those transformers are the data-link protocols, which
make it possible to reliably exchange information between neighboring processes in the message passing
model. In addition, several self-stabilizing protocols (e.g. [13, 12]) that are directly written in the
message-passing model use an underlying data-link protocol as a building block.



*Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel. Email: 
dolev@cs.bgu.ac.il. The work started while this author was a visiting professor at LIP6. Research supported 
in part by the ICT Programme of the European Union under contract number FP7-215270 (FRONTS), Deutsche 
Telekom, US Air-Force and Rita Altura Trust Chair in Computer Sciences. 

†UPMC Sorbonne Universités & INRIA, France.

‡UPMC Sorbonne Universités & IUF, France. This work is supported in part by ANR projects SHAMAN,
ALADDIN, and SPADES.






Related Works. The most studied data-link protocol, namely the alternating bit protocol (ABP),
was proved to satisfy some stabilization properties [1, 12, 14]: in any execution of ABP, there exists a
suffix that satisfies the specification (i.e. the ABP is pseudo-stabilizing). However, the impossibility
to bound the amount of time before this suffix is reached makes the ABP unsuitable for most tasks.
In [14, 11], Gouda and Multari and Dolev, Israeli, and Moran independently prove that for a wide
class of problems (including data-link construction) guaranteeing self-stabilization when channels
have unbounded initial capacity requires some kind of unboundedness in the protocol (either
unbounded memory in [14], the existence of some aperiodic function [1], or access to a probabilistic
variable [1]). In other words, those approaches require implementing unbounded capacities with
finite memory, and are thus unlikely to be actually used in real systems. Also, the expected time
before reaching a stable global state depends on the initial contents of communication channels,
and is thus unbounded.

More recent works took the more realistic approach of assuming channels with bounded initial
capacity. The token passing protocol in [12] can be used as a self-stabilizing ABP on bounded
channels and only uses bounded memory. Howell et al. [15] provide another data-link protocol
over bounded channels with the additional desirable property that the underlying communication
channels are unreliable (i.e. they may lose or duplicate messages). Later, Varghese [18] presented
self-stabilizing solutions for a wide class of problems (including data-link) in the same setting using
only bounded memory. The FIFO ordering is crucial for the stabilization since the solution relies on
the fact that a sequence number that is unique in the system is eventually generated and flushes
every stale message in transit. A common drawback of all aforementioned self-stabilizing data-link
solutions is that they assume a FIFO order on messages in the underlying communication channels.

A notable exception is the set of protocols provided in [3], which assume a non-FIFO message passing
system. The main difference with our approach lies in the fact that their system is enhanced
with some failure detector whereas we assume a fully asynchronous system.

Another drawback of previously mentioned self-stabilizing data-link solutions is that they do 
not consider the quantitative impact of faults from the perspective of the upper layer protocol (i.e. 
the layer that actually uses the data-link). Indeed, starting from an arbitrary global state where 
channels may initially contain messages of arbitrary content, being able to bound the number 
of messages sent that are lost or duplicated, or the number of fake messages that are actually 
delivered to the destination is a very important matter. The bound on the number of faulty
messages delivered by a data-link protocol is an important criterion for the usability of the data-link in
larger applications, in order to ensure the fault-resiliency of the global protocol stack. To our
knowledge, only [13, 8] address, to some extent, this concern. A snap-stabilizing data-link (and
global reset) for bounded capacity FIFO channels appears in [15]. In [8] a snap-stabilizing solution
to the propagation of information with feedback (PIF) problem is presented. The solution can 
be seen as a data-link protocol when reduced to a 2-processes system. Snap-stabilization implies 
that any message that is actually sent by the sender process is eventually received by the receiver 
process, so the number of lost messages is 0. However, we cannot provide bounds on the number 
of duplications of a given message or on the number of ghost messages (that is, messages that are 
not sent by the sender but received by the receiver due to the arbitrary content of communication 
channels in the initial configuration). Concerning the self-stabilizing protocols, only an order of 
magnitude on those numbers can be inferred from the stabilization time (if m messages sent or 
received are required to enter a legitimate global state from any arbitrary initialization, then at 
most m messages could be lost, duplicated, or wrongly delivered). To our knowledge, the question 
of fault-resilience optimality for data-link protocols has never been raised before, although it has
important practical consequences. 

Our contribution. Our contribution in this paper is twofold: 

1. We define complexity metrics that are related to the fault-resilience of data-link protocols, 
and present impossibility results in the context of self-stabilization (i.e. the ability to recover 
from any arbitrary initial global state). In particular, we prove that no data-link protocol 
can prevent one message duplication, the delivery of a single fake message, or the reordering 
of a single message. 

2. We present a data-link protocol that is optimal with respect to all presented fault-resilience 
metrics. Moreover, unlike previous self-stabilizing solutions that operate assuming the underlying
communication channels preserve FIFO ordering, the channels we consider may indeed
reorder messages, having some of them remain in the channel for an arbitrarily long time. The
strong fault-resilience property exhibited by our protocol makes it particularly suitable for 
inclusion as a building block in more complex applications. 

Paper organization. The paper is organized as follows. Section 2 presents the network model
and hypotheses, followed by the data-link problem specification. Section 3 introduces three lower
bound results that justify our optimality claim. In Section 4 we propose our optimal stabilizing
data-link protocol together with its correctness proof.

2 Model 

2.1 System Model 

A message-passing distributed system consists of $n$ processes, $p_0, p_1, p_2, \ldots, p_{n-1}$, connected by
communication links through which messages are sent and received. Two processes connected
through a communication link are referred to in the following as neighboring processes.

As emphasized in pQ the purpose of a data-link protocol is to reliably transmit messages from 
one end of a communication medium (link) to the other end. Ideally, messages have to arrive 
without duplication or loss and in the order they have been sent. Therefore, we focus in the
following on the communication between two neighboring processes $p_i$ and $p_j$ where $p_i$ acts as the
sender and $p_j$ acts as the receiver. The communication link between the two processes $p_i$ and $p_j$ is
denoted in the following $(p_i, p_j)$ and is composed of two virtual directed channels $(i,j)$ and $(j,i)$.
The channel $(i,j)$ is used to send messages from $p_i$ to $p_j$ while the channel $(j,i)$ is used to send
acknowledgments from $p_j$ to $p_i$. In systems where $p_j$ is also a message sender, two additional virtual
channels are used to carry the messages from $p_j$ to $p_i$ and acknowledgments from $p_i$ to $p_j$.

We assume in the following that the capacity of each directed channel is c packets (i.e. low level 
messages). Note that in the scope of self-stabilization, where the system copes with an arbitrary 
starting configuration, there is no deterministic data-link simulation that uses bounded memory 
when the capacity of channels is unbounded [14, 12].

The channels are non-FIFO and not necessarily reliable (i.e. packets may not follow the FIFO
order and may be lost). Additionally, their delivery time is unbounded. That is, any non-lost
packet is received in a finite but unbounded time. Each channel $(i,j)$ is weakly fair in the sense
that if the sender sends infinitely often a packet on the channel, then the receiver receives this
packet an infinite number of times. Sending a packet to a channel whose capacity is exhausted (i.e.
the channel already contains c packets) results in losing a packet (either a packet already in the
channel or the packet being sent).

As we deal with arbitrary initial corruption, a channel may initially contain up to c ghost 
packets (i.e. packets that have never been sent and contain arbitrary content). 

A processor is modeled by a state machine that executes steps. Channels are modeled as sets
(rather than queues, to reflect the non-FIFO order). For example, the c-bounded channel $(i,j)$
(used to send messages from $p_i$ to $p_j$) is modeled by a c-sized set denoted by $s_{ij}$.

In each step, a processor changes its local state (i.e. the state of its local memory), and
executes a single communication operation, which is either a send operation or a receive operation.
The communication operation changes the state of an attached channel. In case the communication
operation is a send operation from $p_i$ to $p_j$, then $s_{ij}$ is the union of $s_{ij}$ in the previous state with the
sent packet. If the obtained union does not respect the bound $|s_{ij}| \leq c$, then an arbitrary message in
the obtained union is deleted. In case the communication operation is a receive operation of a (non
null) packet $m$ ($m$ must exist in $s_{ji}$ of the previous state), then $m$ is removed from $s_{ji}$. A receive
operation by $p_i$ from $p_j$ may result in a null packet even when $s_{ji}$ is not empty, thus allowing
unbounded delay for any particular packet. Packet losses are modeled by allowing spontaneous
packet removals from the set.
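
The channel model above can be illustrated by a minimal Python sketch. The class name `BoundedChannel`, its method names, and the use of a seeded random generator to pick "arbitrary" packets are assumptions made for illustration only; a list is used as a multiset so that several copies of the same packet can coexist.

```python
import random


class BoundedChannel:
    """Hypothetical model of a c-bounded, non-FIFO, lossy channel."""

    def __init__(self, capacity, initial=(), rng=None):
        self.capacity = capacity
        self.rng = rng or random.Random(0)
        # The initial configuration may hold up to c ghost packets.
        self.packets = list(initial)[:capacity]

    def _remove_arbitrary(self):
        # "Arbitrary" choice is modeled here by a random pick.
        return self.packets.pop(self.rng.randrange(len(self.packets)))

    def send(self, packet):
        # Union of the previous content with the sent packet; on overflow,
        # an arbitrary element of the union is deleted.
        self.packets.append(packet)
        if len(self.packets) > self.capacity:
            self._remove_arbitrary()

    def receive(self):
        # May return None (null packet) even when the channel is not empty,
        # which models unbounded delay. Non-FIFO: any packet may come out.
        if not self.packets or self.rng.random() < 0.5:
            return None
        return self._remove_arbitrary()

    def lose(self):
        # Spontaneous packet removal models a packet loss.
        if self.packets:
            self._remove_arbitrary()
```

A random pick is only one possible adversary; the proofs in this paper must hold for every choice of deleted, delayed, or lost packet.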

A configuration of the system is the product of the local states of processes in the system and 
of their incident channels. 

An execution is a sequence of configurations, $E = (C_1, C_2, \ldots)$, such that $C_i$, $i > 1$, is obtained
from $C_{i-1}$ when at least one process in the system executes a step.

2.2 Problem Specification 

The specification we provide in this section is borrowed from [16] but we adapt it to the self- 
stabilizing context. In particular, we introduce the idea to bound the number of lost, duplicated, 
ghost and re-ordered messages by some constants. 

Consider a system of two processors $p_i$ and $p_j$. A distributed application needs to send some
messages from $p_i$ to $p_j$. We say that the application layer of $p_i$ sends a message when it requests
the communication protocol to carry this message to $p_j$. This message is delivered to $p_j$ when the
communication protocol releases this message to the application layer of $p_j$. A ghost message is
a message delivered to $p_j$ whereas $p_i$ did not send it previously (due to the arbitrary content of
communication channels in the initial configuration). A duplicated message is a message that is
delivered several times to $p_j$ whereas $p_i$ sent it only once. A message is lost when $p_i$ sends it but it is
never delivered to $p_j$. A message $m$ is reordered when it is delivered to $p_j$ before a message $m'$ whereas $m$
has been sent after $m'$ by $p_i$. Intuitively, the goal of a Stabilizing Data-Link protocol is to provide a
communication black box that ensures some properties on the number of lost, duplicated, ghost and
reordered messages starting from any arbitrary configuration. In the sequel, we formally specify
the Stabilizing Data-Link problem.

We associate to any execution $E$ the sequence $S(E) = m_0 m_1 m_2 \ldots$ of messages sent by $p_i$ in
$E$ and the sequence $R(E) = m'_0 m'_1 m'_2 \ldots$ of messages delivered to $p_j$ in $E$. Note that we consider
that all sent messages are different (even if their actual contents are identical, we can distinguish
them as external observers of the system). We introduce the following notations. For any sequence
$W$ and any integers $i$ and $j$, $W^j$ is the prefix of $W$ of length $j$ and $W_i$ is the suffix of $W$ such that
$W = W^{i-1} W_i$. The notation $\epsilon$ denotes the empty sequence. For example, $R(E)^0 = \epsilon$. For any
message $m$, we define $m^*$ as the repetition of $m$ an arbitrary number of times (possibly 0). For
any sequence $W$, the sequence $W^*$ is the result of the application of the $*$ operator to each message
of $W$.

For any non-negative integers $\alpha$, $\beta$, $\gamma$, and $\delta$, the $(\alpha, \beta, \gamma, \delta)$-Stabilizing Data-Link
communication over c-bounded channels satisfies the following properties starting from an arbitrary
configuration (with $p_i$ and $p_j$ being respectively the sender and the receiver) for any execution $E$:

• $\alpha$-Loss: The first $\alpha$ messages sent by $p_i$ (in the worst case) may be lost.

$\exists a \leq \alpha, \forall m \in S(E)_a, m \in R(E)$

• $\beta$-Duplication: The first $\beta$ messages delivered to $p_j$ (in the worst case) may be duplicated
ones.

$\exists b \leq \beta, \forall m \in S(E), |\{m'_i = m \mid m'_i \in R(E)\}| > 1 \Rightarrow m \in R(E)^b$

• $\gamma$-Creation: The first $\gamma$ messages delivered to $p_j$ (in the worst case) may be ghost messages.

$\exists c \leq \gamma, \forall m \in R(E), m \notin S(E) \Rightarrow m \in R(E)^c$

• $\delta$-Reordering: The first $\delta$ messages delivered to $p_j$ (in the worst case) may be reordered.

$\exists d \leq \delta, R(E)_d = S(E)^*$

In the following section, we show that it is impossible to perform an $(\alpha, \beta, \gamma, \delta)$-Stabilizing
Data-Link communication with $\beta = 0$, $\gamma = 0$, or $\delta = 0$. Then, we can deduce that a (0, 1, 1, 1)-Stabilizing
Data-Link communication achieves optimal fault-resiliency. The above definitions imply that such
a communication protocol ensures that $R(E) = S(E)$ or $R(E) = m.S(E)$ (where $m$ is an arbitrary
message that may be present in $S(E)$) for any execution $E$. In other words, the sequence of messages
received by $p_j$ is identical to the sequence of messages emitted by $p_i$, except for the first delivery in
the worst case.
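
As an illustration of this characterization, the following Python sketch checks a finite trace against the condition $R(E) = S(E)$ or $R(E) = m.S(E)$. The function name `is_0111_delivery` is hypothetical, and the sketch makes two simplifying assumptions: executions are truncated to finite lists, and messages are compared by value (whereas the specification treats every sent message as distinct).

```python
def is_0111_delivery(sent, delivered):
    """Check that `delivered` equals `sent`, possibly preceded by one
    arbitrary extra message (a ghost or a duplicate)."""
    sent, delivered = list(sent), list(delivered)
    if delivered == sent:
        return True
    # R(E) = m.S(E): exactly one extra message, in first position only.
    return len(delivered) == len(sent) + 1 and delivered[1:] == sent


# The extra first message may be a ghost or a copy of a genuine message.
print(is_0111_delivery(["a", "b"], ["ghost", "a", "b"]))
print(is_0111_delivery(["a", "b"], ["a", "a", "b"]))
# A reordering or a loss violates the condition.
print(is_0111_delivery(["a", "b"], ["b", "a"]))
print(is_0111_delivery(["a", "b"], ["a"]))
```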

3 Lower Bounds 

In this section, we propose three impossibility results related to the possible values for the
parameters $\beta$, $\gamma$, and $\delta$. We prove that the lower bound for each of the $\beta$, $\gamma$, and $\delta$ parameters is 1. These results
confirm the claim that the protocol we propose is optimal since it implements a (0, 1, 1, 1)-Stabilizing
data-link.

Theorem 1 There exists no $(\alpha, \beta, \gamma, \delta)$-Stabilizing Data-Link communication algorithm over
c-bounded channels with $\gamma = 0$.

Proof: By contradiction, let $A$ be any $(\alpha, \beta, 0, \delta)$-Stabilizing Data-Link communication algorithm
over c-bounded channels. $A$ must have an instruction that delivers messages to the receiver processor.
As the program counter may be corrupted and channels may contain up to c ghost messages in the
initial configuration, the receiver processor may execute this instruction during the first step of an
execution $E$. In consequence, the first message of $R(E)$ may be a ghost message $m$. Hence, we can
assume that $R(E)^1 = m$.

It is possible to construct the execution $E$ such that $m \notin S(E)$. In conclusion, we have:
$\exists m \in R(E), m \notin S(E) \wedge m \notin R(E)^0 = \epsilon$ (recall that $\epsilon$ denotes the empty sequence). This
is contradictory with the 0-Creation property of $A$ and implies that $\gamma \geq 1$ for any $(\alpha, \beta, \gamma, \delta)$-Stabilizing
Data-Link communication algorithm over c-bounded channels. □

Theorem 2 There exists no $(\alpha, \beta, \gamma, \delta)$-Stabilizing Data-Link communication algorithm over
c-bounded channels with $\beta = 0$.

Proof: By contradiction, let $A$ be any $(\alpha, 0, \gamma, \delta)$-Stabilizing Data-Link communication algorithm
over c-bounded channels. Following Theorem 1 we have $\gamma > 0$. This implies that the first message
delivered to $p_j$ in an execution $E$ by $A$ may be a ghost message $m$. Hence, we can assume that
$R(E)^1 = m$.

It is possible to construct the execution $E$ such that the first (real) message sent by $p_i$ to $p_j$
and delivered to $p_j$ by $A$ is the same message $m$. This message has been sent by $p_i$ only once
but has been delivered to $p_j$ at least twice. In conclusion, we have:
$\exists m \in S(E), |\{m'_i = m \mid m'_i \in R(E)\}| > 1 \wedge m \notin R(E)^0 = \epsilon$ (recall that $\epsilon$ denotes the empty sequence). This is contradictory with
the 0-Duplication property of $A$ and implies that $\beta \geq 1$ for any $(\alpha, \beta, \gamma, \delta)$-Stabilizing Data-Link
communication algorithm over c-bounded channels. □

Theorem 3 There exists no $(\alpha, \beta, \gamma, \delta)$-Stabilizing Data-Link communication algorithm over
c-bounded channels with $\delta = 0$.

Proof: By contradiction, let $A$ be any $(\alpha, \beta, \gamma, 0)$-Stabilizing Data-Link communication algorithm
over c-bounded channels. Following Theorem 1 we have $\gamma > 0$. This implies that the first message
delivered to $p_j$ by $A$ in an execution $E$ may be a ghost message $m$. Hence, we can assume that
$R(E)^1 = m$.

It is possible to construct the execution $E$ such that $S(E)^{\alpha+2} = m_0 m_1 \ldots m_{\alpha-1} m_\alpha m$ and
$\forall i \in \{0, \ldots, \alpha\}, m_i \neq m$. As $A$ satisfies the $\alpha$-Loss and the 0-Reordering properties, it follows
that $\exists i \in \{0, \ldots, \alpha\}, R(E)^1 = m_i$ (otherwise, either $A$ lost at least
$\alpha + 1$ messages or reordered at least one message, which is contradictory). As $m_i \neq m$, we obtain
a contradiction that shows that $\delta \geq 1$ for any $(\alpha, \beta, \gamma, \delta)$-Stabilizing Data-Link communication
algorithm over c-bounded channels. □

In the next section, we present a protocol that is optimal with respect to the $\alpha$, $\beta$, $\gamma$, and $\delta$
parameters. That is, our protocol satisfies the (0, 1, 1, 1)-Stabilizing Data-Link specification.

4 A (0, 1, 1, 1)-Stabilizing Data-Link Protocol 
4.1 Presentation of the Protocol 

Key ideas of the protocol. The rationale of the protocol consists in adding safety extensions 
to the well-known alternating bit protocol (a.k.a. ABP). The concept used in the design of the 
data-link protocol is to let the sender use a mechanism based on the capacity c of communication 
channels so that the sender can ensure the execution of an operation in the receiver side. More 
precisely, the receiver acts only upon receiving a packet from the sender. The sender may repeatedly
send a particular packet, and each time the receiver receives a packet it acknowledges the packet
arrival.

Figure 1: General organization of our system.

First, the receiver can deliver a message only if c + 1 copies of this message have been previously 
received: this ensures that at least one of them is genuine (i.e. was actually sent by the sender). 
Moreover, a message is delivered only if the expected bit alternates with the one of the previously 
received message (similarly to the ABP) in order to ensure that no message is duplicated. Indeed, 
the sender may still send copies of the message with the same alternating bit value until it receives 
a sufficient number of acknowledgments. 

Second, for each message sent, the sender waits for at least 3c + 2 acknowledgments with a
matching alternating bit. As up to c acknowledgments could be ghost, this implies that 2c + 2
of these acknowledgments were actually sent by the receiver. One such acknowledgment could be
sent by the receiver due to bad initialization, c of them could be due to c initial ghost messages in
the reverse direction, and the remaining c + 1 can only originate from genuine messages from the
sender, which triggered a delivery at the receiver.
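
To make this counting argument concrete, it can be written out as follows. This is only an illustration of the paragraph above, and the value $c = 2$ used afterwards is an arbitrary example.

```latex
\begin{align*}
\underbrace{3c+2}_{\text{acknowledgments counted by the sender}}
  - \underbrace{c}_{\text{ghost acknowledgments}}
  &= 2c+2 && \text{(actually sent by the receiver)} \\
(2c+2) - \underbrace{1}_{\text{bad initialization}}
  - \underbrace{c}_{\text{answers to ghost messages}}
  &= c+1 && \text{(answers to genuine messages)}
\end{align*}
```

For $c = 2$, the sender waits for 8 acknowledgments, of which at least 6 were sent by the receiver and at least 3 = c + 1 answer genuine packets. This is exactly the number of copies the receiver requires before processing a message.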

At this stage, the protocol does not ensure the 0-Loss property due to the use of the alternating
bit. Indeed, if the alternating bit values of the sender and of the receiver are not synchronized at the
first delivery, the receiver drops the first message. To avoid this message loss, the sender alternates
between actual messages and synchronization messages. In other words, to send a message m,
the sender first sends a synchronization message (denoted by <SYNCHRO>) until it receives
3c + 2 acknowledgments of this synchronization message and then sends the actual message m until
it receives 3c + 2 acknowledgments of m. It follows that only the synchronization message may be
lost and the actual message is always delivered to the receiver.
General organization of the system. Our system is organized as follows. The application
layer generates messages to be sent from $p_i$ to $p_j$. To achieve this goal, it invokes our stabilizing
data-link protocol. In turn, the data-link layer invokes procedures provided by the physical channel.

In more detail, the stabilizing data-link protocol is composed of two functions: Send (which is
executed on the sender side) and Receive (which is executed on the receiver side). When the
application layer on the sender side wants to send a message m, it invokes Send(m). The Send procedure is
blocking: if Send is already in execution, the application layer waits for its termination, whereas
the Receive function is always executed on the receiver side. When the Receive function has a
message to deliver to the application layer on the receiver side, it executes DeliverMessage(m),
which transmits m to the application layer. When the Receive function wants to discard a
synchronization message (since this kind of message is useless to the application layer), it uses the
DropMessage function, which only deletes the message. Finally, each delivered message is
acknowledged to the application layer on the sender side by DeliverAck(m).

Functions Send and Receive must interact with the physical channel in order to exchange
messages. For this, we assume that the channel provides two operations. First, it provides
an operation to send a message or an acknowledgment, respectively SendPacket(m, ab) and
SendPacket(ack, (m, ab)), where m is the message and ab its alternating bit value. This operation
puts m (or its acknowledgment) in the channel if it is possible (if this operation leads to
more than c messages in the channel, one of them is arbitrarily deleted). Second, it provides an
operation to receive a message or an acknowledgment, respectively ReceivePacket(m, ab) and
ReceivePacket(ack, (m, ab)), where m is the message and ab its alternating bit value. On the
receiver side, ReceivePacket(m, ab) is executed when the channel has a message to deliver and
when Receive is not in execution. It then sets m and ab to the actual values of the delivered message.
In other words, the reception (for the data-link protocol) on the receiver side is message-driven.
On the sender side, ReceivePacket(ack, (m, ab)) is executed by the data-link protocol and performs
polling: it checks whether the first waiting message in the channel (if any) matches
an acknowledgment of the parameter (m, ab). It returns true if this is the case, false otherwise.
In any case, the first waiting message (if any) is deleted from the channel. The architecture of our
system is summarized in Figure 1.

Detailed presentation of the protocol. Our (0, 1, 1, 1)-stabilizing data-link protocol SVC is
presented as Figure 2. In the following, we provide details about the two functions Send and
Receive.

The function Send takes a message m as parameter and stores the current alternating bit value
in the variable ab. First, it alternates the value of ab (line 01) before sending a synchronization
message (line 02) using an auxiliary function SendMessage. Then, lines 03 and 04 repeat these
instructions with the message m. Once the last invocation of SendMessage returns, it delivers to
the application layer the acknowledgment of m using DeliverAck. Now, let us describe the
auxiliary function SendMessage. This function repeatedly (while loop of line 02) sends its parameter
message m' (line 03) until receiving 3c + 2 acknowledgments for this message (lines 04-05).
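
The sender side described above can be sketched in Python as follows. This is an illustrative transcription rather than the authors' implementation: the class name `Sender`, the callback parameters standing for SendPacket, ReceivePacket(ack, ...) and DeliverAck, and the string encoding of the synchronization message are all assumptions.

```python
SYNCHRO = "<SYNCHRO>"  # hypothetical encoding of the synchronization message


class Sender:
    def __init__(self, c, send_packet, receive_ack, deliver_ack, ab=False):
        self.c = c                      # channel capacity
        self.send_packet = send_packet  # stands for SendPacket(m, ab)
        self.receive_ack = receive_ack  # stands for ReceivePacket(ack, (m, ab))
        self.deliver_ack = deliver_ack  # stands for DeliverAck(m)
        self.ab = ab                    # persistent bit, arbitrary at start

    def send_message(self, m, ab):
        # Resend (m, ab) until 3c + 2 matching acknowledgments are polled.
        ack = 0
        while ack < 3 * self.c + 2:
            self.send_packet(m, ab)
            if self.receive_ack(m, ab):
                ack += 1

    def send(self, m):
        self.ab = not self.ab
        self.send_message(SYNCHRO, self.ab)  # may be lost at the receiver
        self.ab = not self.ab
        self.send_message(m, self.ab)        # always processed
        self.deliver_ack(m)
```

With a channel stub that acknowledges every packet, one call to `send` emits 3c + 2 packets for the synchronization message followed by 3c + 2 packets for the actual message, with opposite alternating bit values. On a real lossy channel the loop may send many more packets.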

The function Receive takes no parameter and uses two variables. The first one is the alternating
bit value of the last delivered or dropped message, stored in last_delivered, and the second one is a
queue Q that stores the number of receptions of at most c + 1 different messages. Each element of
this queue is a 3-tuple (m, ab, count), where m is a message, ab is an alternating bit value, and count
is an integer denoting the number of packets (m, ab) received for the corresponding m and ab since
the last DeliverMessage or DropMessage occurred. The queue [] operator takes a message m
and a boolean ab as operands, and either enqueues (m, ab, 0) (if (m, ab, *) is not present in Q; in that
case, if the queue contained c + 1 elements, the last element of the queue is dequeued) or returns a pointer
to the count value associated to m and ab in Q. Any time a tuple value is changed in the queue,
this tuple is promoted to the top of the queue (in order to keep in memory the c + 1 latest received
messages), and the size of the queue does not change. The ⊥ assignment to a queue Q denotes the
fact that Q is emptied. At each reception of a message (m, ab) (line 01), the corresponding entry in
the queue is updated (or created if needed) by line 02. If $p_j$ already received c + 1 copies of m since
the last DeliverMessage or DropMessage occurred (test on line 03) then the queue is emptied
(line 10). Moreover, if the alternating bit value of the message is different from last_delivered (test
on line 04), then the message is either delivered with DeliverMessage (line 06) or dropped with
DropMessage (line 08) depending on whether it is a synchronization message or not (test on line 05). Then,
the last_delivered value is updated by line 09. Finally, in any case, the message is acknowledged
to the sender with line 11.
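
The receiver side can be sketched in the same way. Again this is a hedged illustration: the class name `Receiver`, the callbacks, and the `OrderedDict` representation of the queue Q (most recently updated entry last, least recently updated entry evicted when c + 1 entries are present) are assumptions. For simplicity the queue starts empty here, whereas in an arbitrary initial configuration it may hold corrupted entries.

```python
from collections import OrderedDict

SYNCHRO = "<SYNCHRO>"  # hypothetical encoding of the synchronization message


class Receiver:
    def __init__(self, c, deliver, send_ack, last_delivered=False):
        self.c = c
        self.deliver = deliver            # stands for DeliverMessage(m)
        self.send_ack = send_ack          # stands for SendPacket(ack, (m, ab))
        self.last_delivered = last_delivered  # arbitrary at start
        self.q = OrderedDict()            # (m, ab) -> count, at most c + 1 keys

    def on_packet(self, m, ab):
        key = (m, ab)
        # Create the entry if needed, evicting the least recently updated one.
        if key not in self.q and len(self.q) == self.c + 1:
            self.q.popitem(last=False)
        self.q[key] = min(self.q.get(key, 0) + 1, self.c + 1)
        self.q.move_to_end(key)           # promote the updated tuple
        if self.q[key] >= self.c + 1:     # c + 1 copies: one is genuine
            if self.last_delivered != ab:
                if m != SYNCHRO:
                    self.deliver(m)
                # A synchronization message is simply dropped.
                self.last_delivered = ab
            self.q.clear()                # Q := ⊥
        self.send_ack(m, ab)              # acknowledge in every case
```

In this sketch a message is handed to the application only on its (c + 1)-th copy and only if its bit differs from `last_delivered`, so further copies carrying the same bit are acknowledged but not delivered again.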

4.2 Correctness Proof 

In this section, let pi and pj be two neighboring nodes that execute SVC, pi being the sender and 
Pj the receiver. Let E = (C\, C2, ■ ■ ■) be an execution starting from an arbitrary configuration. 

We say that a message m' is processed by $p_j$ when $p_j$ executes DeliverMessage(m') (line 06
of the Receive function) if m' is a normal message or when $p_j$ executes DropMessage(m') (line 08
of the Receive function) if m' is a <SYNCHRO> message.

First, we need two preliminary results related to the result of the execution of the procedure
SendMessage by $p_i$ depending on the configuration in which $p_i$ starts to execute this procedure.

Lemma 1 When $p_i$ starts to execute SendMessage(m', ab) in a configuration where ab $\neq$ last_delivered,
the message m' (either a <SYNCHRO> message or a normal message) and every
message parameter to a subsequent invocation of SendMessage is processed by $p_j$ in a finite time.

Proof: Consider a configuration $C_k$ where ab $\neq$ last_delivered. Assume that $p_i$ starts to execute
SendMessage(m', ab) in $C_k$. By contradiction, assume m' is never processed by $p_j$ in the
remainder of $E$. That is, $p_j$ never executes lines 06 or 08 in the Receive procedure. In turn, tests on
lines 03 and 04 never evaluate to true simultaneously.

As last_delivered $\neq$ ab in $C_k$ and last_delivered may change only when m' is processed (line
09), we know that the test on line 04 is always true (since m' is never processed by assumption).

This implies that Q[m', ab] $\geq$ c + 1 never evaluates to true (test on line 03). This implies that
the sender stops sending (m', ab) before the (m', ab) counter reaches c + 1, which is impossible. The
reason is as follows. In order to stop sending the same message, $p_i$ must get 3c + 2 acknowledgments
with the expected content (ack, (m', ab)). If such 3c + 2 acknowledgments are indeed received, this
implies that the receiver issued at least 2c + 2 of those acknowledgments, and thus received 2c + 2
packets (m', ab). Consider the first such packet (m', ab) received by $p_j$. If there is no reset of $p_j$'s
queue following this packet, the head of the queue now contains an entry (m', ab, *) that cannot
be deleted until the receiver resets the entire queue. Indeed, at most c packets are initially present
in the receiver's input channel, and they can create at most c entries in the queue. Since the queue is of
size c + 1, the (m', ab, *) tuple remains. Now, if the receiver sends c + 1 packets (ack, (m', ab)), it
implies that the receiver's queue entry (m', ab, *) was incremented c + 1 times, which invalidates
the assumption. It follows that m' is processed in a finite time.
Send
input:
    m: message to be sent
persistent variable:
    ab: boolean that states the current alternating bit value

01: ab := ¬ab
02: SendMessage(<SYNCHRO>, ab)
03: ab := ¬ab
04: SendMessage(m, ab)
05: DeliverAck(m)

SendMessage
input:
    m': message to be sent
    ab: boolean that states the alternating bit value associated to m'
variable:
    ack: integer denoting the number of acknowledgments received for the current ab value

01: ack := 0
02: while ack < 3c + 2
03:     SendPacket(m', ab)
04:     if ReceivePacket(ack, (m', ab))
05:         ack := ack + 1

Receive
persistent variables:
    last_delivered: boolean that states the alternating bit value of the last delivered message
    Q: queue of size c + 1 of 3-tuples (m, ab, count), where m is a message, ab is an alternating
       bit value, and count is an integer denoting the number of packets (m, ab) received for the
       corresponding m and ab since the last DeliverMessage or DropMessage occurred.

01: upon ReceivePacket(m, ab)
02:     Q[m, ab] := min(Q[m, ab] + 1, c + 1)
03:     if Q[m, ab] ≥ c + 1 then
04:         if last_delivered ≠ ab then
05:             if m ≠ <SYNCHRO> then
06:                 DeliverMessage(m)
07:             else
08:                 DropMessage(m)
09:             last_delivered := ab
10:         Q := ⊥
11:     SendPacket(ack, (m, ab))

Figure 2: SVC, a (0, 1, 1, 1)-Stabilizing Data-Link protocol



Note that after the processing of m', ab and last_delivered have the same value, due to the
execution of line 09 of the Receive procedure. Hence the next invocation of the SendMessage
primitive by $p_i$ will make the values of ab and last_delivered different. Applying the above reasoning,
the lemma follows. □

Lemma 2 When pi starts to execute SendMessage(m', ab) in a configuration where ab =
last_delivered, only m' (either a <SYNCHRO> message or a normal message) is not processed by
pj.

Proof: Consider a configuration Ck where ab = last_delivered. Assume that pi starts to execute
SendMessage(m', ab) in Ck.

Since the test in line 04 of the Receive procedure evaluates to false, m' is not
processed. However, since pi keeps sending m' and pj acknowledges these packets, the
SendMessage procedure returns. Note that pi executes line 01 or 03 of the Send procedure
before the next invocation of the SendMessage procedure.

It follows that the system reaches in a finite time a configuration where ab ≠ last_delivered.
Then, Lemma 1 implies that every message passed as a parameter to subsequent invocations of
SendMessage is eventually processed by pj. □

Now, we can prove that SVC satisfies the four properties of the specification (see Section 2.2)
starting from any configuration. 

Lemma 3 SVC satisfies the 0-Loss property. 

Proof: Assume that pi has to send a message m to pj starting from an arbitrary configuration.
Note that the proofs of Lemmas 1 and 2 imply that any invocation of the Send procedure eventually
ends. This implies in turn that pi starts to execute Send(m) in a finite time.

Then, pi first invokes SendMessage with a <SYNCHRO> message as parameter (see
line 02 of the Send procedure). By Lemma 2, this <SYNCHRO> message may be lost if ab =
last_delivered when pi starts to execute SendMessage.

It then follows from Lemma 2 that ab ≠ last_delivered when pi starts to execute SendMessage
with m as parameter (see line 04 of the Send procedure), since pi has executed line 03 of the
Send procedure. By Lemma 1, it follows that m is eventually processed by pj. As m is a normal
message, this implies by definition that m is delivered to pj in a finite time.

As this result holds whatever the state of the system when pi requests to send m, we obtain
that $\forall m \in S(E),\ m \in R(E)$. It is sufficient to observe that $S(E) = S(E)_0$ to obtain the result. □
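
The role of the <SYNCHRO> message in this proof can be illustrated with a small simulation. The sketch below is not the protocol itself: it assumes an idealized channel on which every packet and every acknowledgment arrives instantly and exactly once (no loss, no reordering, no ghost packets), and the names run_send, receive_packet, and send_message are hypothetical. It only shows that, whatever the initial values of ab and last_delivered, one call to Send delivers m exactly once.

```python
SYNCHRO = "<SYNCHRO>"  # assumed placeholder for the synchronization message


def run_send(m, c, ab, last_delivered):
    """Run Send(m) against the Receive logic over an idealized instant channel.

    Returns (delivered messages, final ab, final last_delivered).
    """
    queue, delivered = {}, []

    def receive_packet(msg, bit):
        nonlocal queue, last_delivered
        key = (msg, bit)
        queue[key] = min(queue.get(key, 0) + 1, c + 1)
        if queue[key] >= c + 1:
            if last_delivered != bit:
                if msg != SYNCHRO:
                    delivered.append(msg)
                last_delivered = bit
            queue = {}
        return True  # assumption: the acknowledgment always comes back

    def send_message(msg, bit):
        ack = 0
        while ack < 3 * c + 2:
            if receive_packet(msg, bit):  # SendPacket followed by the ack
                ack += 1

    ab = not ab                  # line 01 of Send
    send_message(SYNCHRO, ab)    # line 02
    ab = not ab                  # line 03
    send_message(m, ab)          # line 04
    return delivered, ab, last_delivered
```

When ab differs from last_delivered after line 01, the <SYNCHRO> message is processed and dropped. When they are equal, it is ignored, as in Lemma 2. In both situations ab differs from last_delivered when m is sent, so m is delivered.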

Lemma 4 SVC satisfies the 1-Duplication property. 

Proof: By contradiction, assume that there exists an execution E of SVC such that
$\forall b \leq 1,\ \exists m \in S(E),\ |\{m_i = m \mid m_i \in R(E)\}| > 1 \wedge m \notin R(E)^b$. In particular, this property is true for b = 1.
Hence, $\exists m \in S(E),\ |\{m_i = m \mid m_i \in R(E)\}| > 1 \wedge m \notin R(E)^1$. In other words, there exists in E a
message m sent by pi and delivered several times to pj. Moreover, m is not the first message received
by pj.

This implies that line 06 of the Receive procedure is executed several times for the message
m. This is impossible for the following reason. After the first delivery of m, the receiver empties
the queue and sets last_delivered = ab (see proof of Lemma 2). Note that pi modifies ab only
when it stops sending m. Even if pi keeps invoking SendPacket(m, ab) until it receives the 3c + 2
acknowledgments, none of these packets will be delivered since, for each of them, the test in line
04 of the Receive procedure returns false.

This contradiction implies that only the first message received by pj may be duplicated. The
lemma follows. □

Lemma 5 SVC satisfies the 1-Creation property.

Proof: By contradiction, assume that there exists an execution E of SVC such that
$\forall c \leq 1,\ \exists m \in R(E),\ m \notin S(E) \wedge m \notin R(E)^c$. In particular, this property is true for c = 1.
Hence, $\exists m \in R(E),\ m \notin S(E) \wedge m \notin R(E)^1$. In other words, there exists in E a message m not sent by pi but
delivered to pj. Moreover, m is not the first message received by pj.

Initially the channel may contain at most c ghost messages. In the worst case, the
receiver's queue also contains an entry for each of the ghosts, with the counters initialized to c or
c + 1.

Let (g, ab) be the first ghost message received by pj with alternating bit set to ab. Let us study
the two possible cases. First, assume that ab ≠ last_delivered. Then pj delivers g (line 06 of the
Receive procedure) and empties the queue (line 10 of the Receive procedure). Second, assume that
ab = last_delivered. Then pj does not deliver g (due to the test of line 04 of the Receive procedure)
but it empties the queue (line 10 of the Receive procedure).

In both cases, at most one ghost message is delivered to pj and the queue has been
emptied. In turn, at most c - 1 ghost messages now remain in the channel (i, j). If one of
them is received by pj (after an invocation of ReceivePacket), its associated counter cannot reach
the value c + 1 (unless pi starts to send the same message, but in this case it is no longer a ghost
message) since there are at most c - 1 copies of the same message. Consequently, none of the c - 1
remaining ghost messages can be delivered, which contradicts the construction of m and proves the
result. □
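
The counting argument above can be checked by brute force for small capacities. The following sketch is an illustration under assumptions, not part of the proof: the function names are hypothetical, ghosts are drawn from two arbitrary message names, and the sender is assumed to stay silent so that only ghost packets reach the receiver. It enumerates every initial queue content with at most c + 1 entries, both values of last_delivered, and every sequence of c ghost packets, and reports the largest number of ghost deliveries observed.

```python
from itertools import product


def receive_all(c, last_delivered, queue, packets):
    """Feed packets to the Receive logic and return the delivered messages."""
    queue, delivered = dict(queue), []
    for m, ab in packets:
        key = (m, ab)
        queue[key] = min(queue.get(key, 0) + 1, c + 1)
        if queue[key] >= c + 1:
            if last_delivered != ab:
                delivered.append(m)  # ghosts are treated as normal messages here
                last_delivered = ab
            queue = {}
    return delivered


def max_ghost_deliveries(c, messages=("g1", "g2")):
    """Worst case over arbitrary initial receiver states and c ghost packets."""
    kinds = list(product(messages, (False, True)))
    worst = 0
    for counts in product(range(c + 2), repeat=len(kinds)):
        if sum(1 for n in counts if n) > c + 1:
            continue  # the queue holds at most c + 1 entries
        queue = {k: n for k, n in zip(kinds, counts) if n}
        for last_delivered in (False, True):
            for ghosts in product(kinds, repeat=c):
                got = receive_all(c, last_delivered, queue, ghosts)
                worst = max(worst, len(got))
    return worst
```

For c = 1 and c = 2 the function returns 1, in line with the claim that at most one ghost message is delivered.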

Lemma 6 SVC satisfies the 1-Reordering property.

Proof: Following Lemma 5, SVC delivers at most one ghost message to pj in E. Let us consider
the following two possible cases.

Case 1: SVC delivers no ghost message to pj in E. 

According to Lemmas 3 and 4, any message sent from pi is delivered to pj exactly once in
this case. Now, observe that any message is delivered to pj between the beginning and the
end of the corresponding execution of the procedure Send by pi. Indeed, the message is
delivered to pj when it receives the (c + 1)-th copy of the message, whereas pi waits to receive
the (3c + 2)-th acknowledgment of the message before it stops sending it (see proofs of Lemmas 1
and 2). Since the Send procedure is blocking for pi, $R(E)_0 = R_E = S_E$ for any execution E
in which SVC delivers no ghost message to pj. Hence, $\exists d = 0 \leq 1,\ R(E)_d = S_E$.

Case 2: SVC delivers one ghost message to pj in E.

Assume that the ghost message delivered by SVC is m. Lemma 5 allows us to state that m
is the first message delivered to pj. Then, reasoning as in Case 1 allows us
to conclude that $R(E) = m.S(E)$ for any execution E in which SVC delivers one ghost message
m to pj, and then $R(E)_1 = S_E$. Hence, $\exists d = 1 \leq 1,\ R(E)_d = S_E$.

In both cases, we show that SVC satisfies the 1-Reordering property. □ 
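
The two cases can be turned into a small check on sequences. The sketch below assumes that R(E)_d denotes the sequence R(E) without its first d messages. This reading is inferred from the two cases above rather than quoted from the specification, and the function names are hypothetical.

```python
def suffix(seq, d):
    """Return seq without its first d elements (assumed reading of R(E)_d)."""
    return list(seq[d:])


def satisfies_reordering(sent, received, bound=1):
    """True if some d <= bound makes the suffix of received equal to sent."""
    return any(suffix(received, d) == list(sent) for d in range(bound + 1))
```

For example, satisfies_reordering(["a", "b"], ["g", "a", "b"]) holds with d = 1 (one leading ghost message g), whereas the received sequence ["b", "a"] is rejected.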
Now, we can conclude with the following corollary of Lemmas 3, 4, 5, and 6.

Theorem 4 SVC satisfies the (0, 1, 1, 1)-Stabilizing Data-Link Communication specification.

5 Conclusion 

In this paper, we focused on stabilizing data-link protocols over channels of bounded capacity 
c. First, we introduced some measures for fault-resilience following the specification presented 
in [16] that is suitable to the self-stabilizing setting. Then, we proved lower bounds on these
parameters. Finally, we proposed a stabilizing data-link protocol that emulates FIFO reliable links 
over unreliable bounded non-FIFO communication environment with an optimal fault-resilience. 
To achieve this optimal fault-resilience, our protocol sends 6c + 4 packets (and their corresponding 
acknowledgements) to deliver one message to the application layer. 
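
The figure of 6c + 4 packets can be read off the pseudocode of the protocol: Send invokes SendMessage twice (once for the <SYNCHRO> message and once for m), and each invocation returns only after 3c + 2 acknowledgments have been received. Assuming, as a best case, that every packet sent is acknowledged exactly once, the count is:

```latex
\[
  \underbrace{(3c + 2)}_{\langle \mathit{SYNCHRO} \rangle}
  + \underbrace{(3c + 2)}_{m}
  = 2\,(3c + 2)
  = 6c + 4 .
\]
```

For instance, c = 1 gives 10 packets per message. When packets or acknowledgments are lost, more packets are sent, since the loop of SendMessage counts only the acknowledgments actually received.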

Some interesting open questions follow. Is it possible to achieve optimal fault-resilience with a 
(significantly) lower message complexity for a given channel capacity c? Recently, some works on 
snap-stabilizing point-to-point communication (7J EJ [5] across multiple hops have been presented
in a coarse-grained communication model. Is it possible to extend these results to the more realistic
message passing model using our Stabilizing Data-Link as a communication black box? If so, is it
possible to provide optimal fault resilience as in the one-hop case?